Mean-value identities as an opportunity for Monte Carlo error reduction 
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In the Monte Carlo simulation of both Lattice field-theories and of models of Statistical Mechanics, 
identities verified by exact mean- values such as Schwinger-Dyson equations, Guerra relations, Callen 
identities, etc., provide well known and sensitive tests of thermalization bias as well as checks of 
pseudo random number generators. We point out that they can be further exploited as control 
variates to reduce statistical errors. The strategy is general, very simple, and almost costless in 
CPU time. The method is demonstrated in the two dimensional Ising model at criticality, where 
the CPU gain factor lies between 2 and 4. 

PACS numbers: 05.50.-|-q. 02.70.-c 75.40.Mg, 



I. INTRODUCTION 

Monte Carlo simulation [1, 2] is one of the handful 
of general methods in the theoretical physicist's toolbox 
that can be applied to nonperturbative problems. In 
spite of this, it is a very inefficient method: the computa- 
tional effort needed to get yet another decimal significant 
figure grows by a factor of 100. 

Yet, there are alternatives to brute force when more ac- 
curacy is needed. A classical strategy consists in looking 
for statistical estimators of the sought quantities, which 
have the same expectation value as the commonly used 
naive estimators, but a reduced variance. The multihit 
method [3] (and later developments [4]) for the Polyakov 
loop in Lattice QCD is a conspicuous example of such an 
improvement. Now, the numerical error is proportional 
to the square root of variance for the considered estima- 
tor. It follows that reducing the variance by a factor of 
two reduces as well in the same factor the numerical effort 
needed to achieve the desired statistical accuracy. Even a 
modest factor of variance reduction can be a significant 
improvement: the CPU time needed in application to 
Lattice Gauge Theory or to Condensed Matter Physics 
(think for instance of Spin-Glass simulations [5]) oftenly 
lies in the range 10-10^ processor years. 

Here we propose a general road to variance-reduction 
based in known identities between exact mean- values. In 
spite of its usefulness, this strategy, known as control 
variates in the mathematical literature [6, 7], is still not 
commonly used in the framework of Monte Carlo simu- 
lations in Physics (at the practical level, it requires only 
standard Monte Carlo data- analysis tools). In fact, it is 
fairly common to find in Field Theory or in Statistical 
Mechanics that a particular linear combination of non- 
trivial expectations values vanishes exactly (we provide 
below specific examples). There are different ways of 
finding such identities: Schwinger-Dyson equations ex- 
ploit invariances of the integration measure [8]; Callen 
identities are derived by integrating in the functional 
integral some variable while holding fixed all others [9] 
(multihit operators [3, 4] belong to this category); Guerra 



relations are somehow specific to disordered systems [10]; 
in models where a cluster method works [11] cluster esti- 
mators with the same expectation value than their spin 
counterparts can be found (see e.g. [12] and references 
therein). It is fair to say that, for any problem amenable 
to a path-integral formulation, each of the above strate- 
gies will provide at least one identity: the vanishing of a 
precise linear combination of expectation values of non- 
trivial observables. 

Researchers performing Monte Carlo simulations are 
acutely aware of the advantages provided by mean- value 
identities. If the numerically obtained expectation values 
do not verify them within errors, this will most probably 
be due to a thermalization bias [13, 14] or to a failure of 
the used Pseudo Random Number Generator [15] (or to 
a programming bug!). We remark here that mean- value 
identities provide as well statistical estimators with re- 
duced variance. The method is exemplified in the stan- 
dard benchmark of the two-dimensional Ising model at 
its critical point. 

We note finally that in previous work [16, 17] covari- 
ance error-reduction was presented for the Finite-Size 
Scaling analysis of phase transitions. Indeed, covariance 
analysis improves the computation of the critical temper- 
ature and the leading scaling-corrections exponent from 
data on finite lattices [16]. It provides as well the op- 
timal combination of different estimates of the sought 
critical exponent (each individual estimate being previ- 
ously extrapolated to infinite volume) [17]. As we discuss 
in Sect. II D, covariance error reduction (specially as pre- 
sented in Reference [17]) is a particular case of the present 
approach. 

The layout of the rest of this note is as follows. In 
Sect. II we recall the error reduction strategy in a gen- 
eral setting (without reference to any specific model). 
The reader merely interested in a practical recipe, may 
proceed directly to Sect. II C. In Sect. Ill, we briefly 
describe the model and the observables, as well as the 
used mean-value identities. We present our numerical 
results in Sect. IV while our conclusions are in Sect. V. 
In Appendix A we present some technical results which 
are specific for the Swendsen-Wang cluster algorithm as 
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applied to the Ising model. 



II. COVARIANCE ERROR REDUCTION 

We first discuss the problem as if the exact covariance 
matrix was accessible (Sect. II A). The effects of time 
correlations are described in Sect. II B. Real life com- 
pUc;ations arise from the fact that the eovarianee matrix 
needs to be estimated from a finite sample of Monte Carlo 
data, which fortunately does not induce any significant 
bias, Sect. TIC. Finally, we discuss in Sect. II D how the 
general approach relates with the problem of finding the 
optimal linear combination of several estimates for the 
very same expectation value. We discuss as well some of 
the very counterintuitive features of this problem. 

A. The minimal error 

Let A, Bi, B2,...Bji be stochastic variables. We 
assume that a set of mean-value identities appropri- 
ate for the problem at hand tell us that (Bi) = for 
i = 1, 2, ... i?. We assume as well that {A"^) and all the 
(Bf) are finite. We wish to profit from the covariance 
between A and the Bi to obtain the best determination 
(in the sense of minimal variance) of (A) . 

Before going on, it is useful to note that the operation 
of computing the covariance between real- valued stochas- 
tic variables X and Y, 



axY ^ {{X - {X))iY -{¥))) , 



(1) 



has the structure of a scalar product. Indeed the four 
following properties are easy to establish (i) it is symmet- 
ric, (7xY = o'YX, (ii) it is linear on each of its arguments, 
<^x{XiYi+X2Y2) = ^if^XYi + A2crxY2> (iii) <^xx > 0, and 
(iv) if {X) = and axx = it follows that X = Q with 
probability one. For later use, we introduce the correla- 
tion coefficient between X and Y 



rxY 



OXY 



\/crxx(^YY 



(2) 



Using the B^, it is straightforward to define stochastic 
variables with expectation value {A): 



i(Ai, As, . . . , Aij) = A -h ^ AiSi . 



(3) 



i=l 



Our task is to find the coefficients {A*}^i that minimize 
the A variance 



that has a minimum at 
R 



X* 



E 



'(^AB„ , 



(4) 



(5) 



In the following, we will denote the optimal random 

variable as 



A* = A{Xl,...,X*2) 



(6) 



whose variance is 



R 



aA'A' =<yAA — ^ crABi(^ ^)ii'0'A_B;, (7) 

Note that rescaling any of the Bi, Bi — > ctiBi, would 
leave A* unchanged. For R=l, Eq. (7) reads 



<yA*A* = o'aa(1 — r\g) . 



(8) 



In particular, whether A and B are correlated or anticor- 
related is immaterial. 

In a nutshell, we face a standard problem of best ap- 
proximation in an Euclidean space: we are decomposing 
the fluctuating part oi A, A — (A), on its components 
parallel and orthogonal with respect to the linear space 
generated by {Bi}fL^. The best approximation. A*, is 
found when the parallel component is made to vanish. 
The minimal variance is the norm squared of the orthog- 
onal component. If we compute in a Monte Carlo sim- 
ulation A* rather than A, we are rewarded with a CPU 
gain factor of aAA/f^A'A*- 



B. Covariance and time correlations 

The stochastic variables X,Y,Z,... considered in 
Sect. II C are actually Monte Carlo time averages. 

Indeed, the Monte Carlo dynamics can be regarded 
as a Markovian random- walk in configuration space [2]. 
Let be one of such spin (or gauge-field) configura- 
tions, and 6*4=0 J Ot=i, ... be the time sequence of con- 
figurations visited by the random-walker. We consider 
functions of the fields configuration X,y,Z,... (observ- 
ables, hereafter), and use the shorthand = X[0{t)], 
t = 0,1, . . . ,T — 1. Hence, our stochastic variable X will 
be (and similarly for Y, Z,. . . ) 



T-l 



(9) 



t=o 



The Markovian random- walk in configuration space, is 
fully determined by a transition matrix, Ve^^^e^, namely 
the conditional probability of reaching Ot+\ from Ot in 
a single step. The transition matrix verifies the balance 
condition, with respect to the equilibrium distribution 
function irlO] 



(10) 



In this work, we shall always consider that, at f = 0, equi- 
librium has been already reached. Thus, the expectation 
value for X is the Boltzmann average for X . 
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It is convenient to consider the equilibrium (sym- 
metrized) time correlation function for two real observ- 
ables, X and y {autocorrelation if X = y) 

Cxy{t) = ^(^-(0)^') + A'(')yo)) - {X){y) . (11) 

Note that Cxy{t) = Cyx{t) = Cxy{—t), and that it is 
bilinear in X and y. We will call Cxy(0) static covari- 
ance, since it can be computed from equal-time expec- 
tation values. Cxy{t) allows to compute axY, since one 
straightforwardly obtains from (9) that 



T-l 



crxY 



1. J2 Cxyit'-t) 



(12) 



t,t'=0 



We define the integrated correlation time {autocorrela- 
tion time, Tint, A', ii X = y) as 



nnt,xy = 



Et=c 
t=- 



Cxy{i) 



2^Cxx{0)Cyy{Q) 



(13) 



Now, a standard argument [2] tells us that, 
Y^'tLi * I^A'3;(i)| < oo, the covariance of X and Y is 



if 



2Ti 



crxY 



mt,xy 



^Cxx{())Cyy{0) 



+ 0{T-^). (14) 



For instance, the rAB in Eq.(8) is just 



rAB 



(15) 



Hence, the cffcctivcncs of a particular control variatc, B, 
does depend on the autocorrelation and correlation times 
of the chosen Monte Carlo algorithm [2 7] . 

We finally recall some well known results [2]. Cxy{t) 
can be computed from the t-th power of the transition 
matrix and the equilibrium distribution as 

Cxy{t) = J2 l('^my[0t] + x[Ot]y[eo]){i6) 

x([P]^Uo-M0t])M0o]. 

At this point, an analogy with Quantum Mechanics is 
in order. Up to now, we have been working in the 
Schrodinger picture, where the probabilities evolve in 
time while the operators remain constant. Yet, it is 
best to work in an equivalent Heisenberg picture where 
only observables evolve in time. We define a time- 
transformation. P, that transforms observable X in ob- 
servable PX. The value taken by PX for configuration 
is a conditional expectation value 

px[o\ = E{x[et+i\\et = e) = ^ A-ie'jpe^e . (i?) 

0' 

Mind that, if the Monte Carlo dynamics is composed 

of consecutive steps (in the Swendsen-Wang dynam- 
ics, for instance, one first update the bonds, then the 



spins: V — 'PspinT'bond), thc evolution operators in 
the Heisenberg picture appear in reversed order (e.g. 



-Pbond-^spin) 



We introduce 



equal time real observables {X,y) 
correlation function is 



a scalar product for 
= {X{t)y{t)). The 



Cxy{t) = 



{x,p\'\y) + {p\'\x,y) 



{x){y). (18) 



Thus the problem of computing correlation times is re- 
duced to the spectral analysis of the operator P. 



C. Practical recipes 

In a Monte Carlo calculation, thc stochastic variables 
A and Bi discussed in Sect. HA are directly related to 
some functions of the spin (or gauge field) configuration, 
A. Bj, i = 1, 2, . . . , i?. One stores in disk T consecutive 
measurements of these functions {-A*-'-* , bJ'' , . . . , ■ 
We assume that autocorrelation times (13) for these mea- 
surements are finite. Their Monte Carlo average 



(t) 



1,2,. ..,R. 



(19) 

are just instances (i.e. disorder realizations) of the ran- 
dom variables A and Bi. 

Let us form N data blocks {Aj,Bij}jL.^ by 
averaging sets of T/N consecutive measurements 
. . . ,6^^} . The basic assumption underlying 
thc Monte Carlo error analysis [18] is that, provided 
that T/N is large enough as compared to Monte Carlo 
autocorrelation times, the {Aj, Bij}jLi are identically 
distributed, and statistically independent for different j. 
Furthermore, one assumes that T/N is so large, that the 
blocked data are not only independent, but also Gaussian 
distributed: 



Aj = {A) + r]f VNaAA , 



(20) 



Bij = vfW^^B.B,, i = l,2,...,R. 



The ?7 are Gaussian random numbers, with zero mean 
and covariance matrix 



= Si 



Sjj'rABi , 
Sjj'rBiB,, 



(21) 



where 6jj' is Kronecker's delta. Note as well that one 
gets exactly the same numbers for A and Bi either by 
averaging over j the {Aj,Bij}, or using Eq. (19). For 
later use, we define also the jackknife blocks (see e.g. [18]) 



Af 



NA - Aj 
N-1 

NBl - Bi 
N-1 



(22) 



,i = 1,2, 



.,R. 
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Our statistical estimators for the covariances are 

N 



E 

N 

E 



N{N-1) 



jr{ NI{N-l) ' 
j^{A, -A){B,,, -B,) 

N 

E 



N 

E 

JV 

E 



7V(Ar- 1) 






■Bi) 


N/{N - 1) 




(Bij - Bi){Bi>j - 




N{N -1) 




{Bi^ -B,){Bi}f^ 


-Bi,) 



(23) 



7V/(A/-- 1) 



At variance with the numbers uaa, <^ABi or OBiBj, our 
estimators gaa, '^AbI or aSiB-, , are random variables. It 
is straightforward to show that their expectation values 
are the sought covariances, but they are subject to sta- 
tistical errors whose (relative) size is of order . In 
fact, since one needs to keep the data-block size T/N as 
large as possible to ensure the correctness of Eq. (20), 
the typical number of blocks is kept low, say N ^ 100 . 
Incidentally, the second equality in each one of Eqs. (23) 
is an algebraic one: we get the same numerical covariance 
estimates from the standard or the jackknife blocks. 

At this point, we may trade the unaccessible minimiza- 
tion Eqs. (5,6) by the computable 



A* 



E )H'^AB^, Bi , Sii, = UBiB^i ■ (24) 



The very same procedure is performed block by block, 
thus obtaining {A*}jLi. Errors are computed in a stan- 
dard way from these blocks. 

The reader might question the validity of Eq. (24), 
because the vanishing of (Bi) does not imply 
(I2t'{^ )ii''^AB^Bi) = 0. This is specially worrying 
since, as we said above, the relative errors for oab" 
aBiB-, are 10% in real-life calculations. The way-out is 
in Eqs. (20,21). If in a particular simulation one finds the 



Gaussian fluctuations {?7^,?7^^ 



,vf'}jLi, the sign- 



reversed fluctuations ^Vj'^ ; • • • ; ^vf^ are just 

as probable. One immediately notices that the covariance 
estimators, Eqs. (23), are invariant under sign-reversal of 
fluctuations. This means that aABi, the matrix S and 
its inverse arc also invariant, while the Bi transform to 
—Bi. Hence, if the probability distribution function of 
{rjf, r]f^ , . . . , T]?^ }f=i is invariant under sign-reversal, it 
follows that the expectation value for A* in Eq. (24) is 
still {A) (according to Rubinstein [7], this fact was first 



noticed for the particular case of Gaussian distributed 
fluctuations in [19]). However, even in the absence of 
sign-reversal invariance, the bias induced is of order 1/T 

while the statistical error is of order 1 / Vt. 

As for functions of expectation values, let us explain 
the procedure by considering the second moment cor- 
relation length Eq. (30), that depends on the expecta- 
tion values of two variables, m(0) and TO(fcmin)- One 
first transforms using Eq. (24) the estimates and the 
jackknife blocks of each of the needed quantities, e.g. 

m*{k^i^) and {mf'*{0),mf'*{k^i^)}^^,. Then 
we use Eq. (30) to obtain our best estimate of the 

correlation length from m*(0), m*{k^i^). To estimate 
the errors, we first form N jackknife blocks by com- 
puting the correlation length from each of the A'' pairs 
{mj^'*(0), mj^'*(fcrnin)}, then use the standard formu- 
lae [18]. 



D. Several observables with same expectation value 

Given a set of random variables Ai,A2,... Ar+i with 
a common expectation value, {Ai) = a, one may won- 
der how to get the best possible estimate of a. This was 
precisely the case considered in [16, 17]. We only discuss 
here the relationship with the (closer in spirit) approach 
of [17], where the Ai were estimates of the critical ex- 
ponent v for an Ising model at its critical point. The 
obvious way of addressing the problem is considering a 
linear combination 



n.+i 



MPi'P2,---,PR+i) = ^PiAi , ^Pi = l, (25) 



i=l 



i=l 



then minimizing cr^^. This is a particular case of the 
optimization problem that we have already discussed at 
length in Sects. II A and II C. In fact, note that Pr+i = 
l—p\—p2 — ...—pR and then, keeping an eye on Eq. (3), 
write A = Ar+i, {Ai = pt , B^ = A, - Ar+i}^^. 

However, this optimization problem produced some 
counterintuitive results [17]. All five computed u esti- 
mates for the two-dimensional Ising model lied above the 
exact value. In spite of this, the improved estimate was 
below the exact value. This apparent paradox can be eas- 
ily explained in our language, by considering the simpler 
case R — 1, (so we have Ai and A2). Using the results 
reviewed in Sect. II A one easily finds that the minimal 
squared error is 



CTA'A* 



CAiAi + 0-^2^2 — '^rAiA2^/(^AiAi<^A2A', 



, (26) 



Hence, if rAiA2 tends to one and if aA^Ai 7^ ^'^2^2 an 
error-less estimator exists. In fact, in the rAiA2 1 limit 
we have Ai = a + rj^aA^Ai, A2 = a + rjy^aj^ with 77 
the same Gaussian random- number for both variables (of 
course (77) = and (77^) = 1). In other words, if for a 
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particular simulation Ai lies bclo"w(above) a, the same 
will be true for In spite of this, if we write pAi + 
(1 - p)A2 = g + vi^/^MM + v{-sf^MM - a/^aUT)] and 
set p = ^0-^2^2 /(V'^AiAi - ^cTAjAa), an exact answer 
is found. Note, however, that the problem becomes ill 
conditioned when oa^Ax approaches cr^aAa- In fact, if 
the two variance coincide we gain nothing by considering 
Ai in addition to A\, since in this case one would have 
A\ = A2 with probability one. 



III. MODEL, OBSERVABLES, MEAN- VALUE 
IDENTITIES 

We shall put to work the strategy in Sect. II, in the 
standard benchmark, the Ising model in two dimensions, 
for which many exact results exist, including exact com- 
putations of some quantities in finite-systems [20] that 
can be directly confronted with the Monte Carlo simula- 
tion. 

The spins iSj are placed in the nodes of a square lat- 
tice of side L with periodic boundary conditions. The 
interaction is restricted to lattice nearest neighbors, the 
partition function being : summation over the 2^ 

spin configurations): 



Z = ^ exp 



llx-irii=i 



(27) 



The system undergoes a second order phase transition at 
Kc = log(l + V2)/2. 

The main functions of the spins that we are considering 
are the energy, and the Fourier transform of the spin 
field at zero and minimal momenta (fc = (0, 0) or kmin = 
(27r/L,0)) 



1 



e = 



SsSy-, m{k) = j^J^^s-e'"-^ . (28) 



From m{k) we define the magnetic susceptibility 

X = L^([m(0)n, (29) 

the second moment correlation length [21] (we gain 
statistics by averaging [m(^min)]^ over (27r/L,0) and 
(0,27r/L)) 



[m(0)]2) - ([m(fc^i„)]2) 
4sin2f ([m(Lin)]2) 



and the Renormalization-Group invariant ratio 

[m(0)]4) 



U4 



(30) 



(31) 



Our first mean-value identity comes from the Fortuin- 
Kasteleyn formulation (see e.g. [2, 18] for details). Given 



a decomposition of the lattice in Af connected compo- 
nents (clusters), each containing ric spins, it is easy to 
show that (see Appendix A for a quick review) 



Hence, our first control variate is 



Bsw = [m(0)]2 - ^ 



(32) 



(33) 



A second control variate comes from a Callen iden- 
tity [9]. Let the local field acting over site x be 



hx — ^ Sff- 

llx-irii=i 



(34) 



Then, if >1, 

(SsSff) = (tanh(K/ij) ta,nh{Khg)) . (35) 

Hence, 

Bci = -p ^ [tanh(«;/ij)tanh(/t/ijf) - S^Sy] , (36) 

||S-y||>l 

that can be computed with 0{L'^) operations as 



Bex = ^ 



(37) 



tanh(K/is) j - ^X^'^^j 



-^[tanh2(K%) - 1] 



— ^ [tanh(K;/i^) tanh(K;/ij^) — S^Sg] 

\\x-y\\ = l 



Finally, a Schwinger- Dyson equation [15] provides a 
third control variate 



IV. RESULTS 



(38) 



We have simulated the model on systems L = 16, 128 
and 512 using the Swendsen-Wang algorithm [11]. For 
each lattice size, we traced clusters 10^ times taking mea- 
surements each time that the clusters were traced. We 
discarded the first 10% of measurements for thermaliza- 
tion (which, on the view of the autocorrelation times for 
this model and algorithm [18], is extremely conservative), 
hence formed N = 100 data-blocks of 9000 measurements 
each (we expect to be well in the Gaussian fluctuations 
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16 

standard 
cluster 

Bsw improved 
Bci improved 
Bsw & Bci improved 
others 



(e) 

1.45339(47) 

1.45334(32) 
1.45316(24) 
1.45319(18) 
1.453065. . . [20] 



X 

139.719(155) 

139.713(127) 

139.700(93) 

139.652(104) 

139.666(73) 

139.546(77) [22] 



14.601(31) 

14.597(19) 
14.590(25) 
14.594(18) 
14.566(14) [22] 



Ih 

1.16502(74) 

1.16510(51) 
1.16524(63) 
1.16517(50) 
1.16586(34) [12] 



128 

standard 
cluster 

-Bsw improved 

Bci improved 

Bsw & Bci improved 

others 



1.419052(100) 

1.419101(94) 
1.419047(79) 
1.419095(66) 
1.419076. . . [20] 



5316.6(76) 
5317.7(70) 
5321.7(60) 
5316.4(68) 
5321.4(51) 
5318.1(28) [22] 



115.77(28) 

115.97(21) 
115.77(26) 
115.96(19) 
115.81(13) [22] 



1.16789(89) 

1.16735(75) 
1.16791(86) 
1.16736(71) 
1.16763(32) [12] 



512 

standard 
cluster 

Bsw improved 
Bci improved 
Bsw & Bci improved 
others 



1.415407(36) 

1.415397(34) 
1.415429(26) 
1.415421(24) 
1.415429. . . [20] 



60180(94) 
60168(88) 
60134(80) 
60230(78) 
60183(62) 
60209(34) [22] 



463.62(115) 

462.99(92) 
464.14(101) 
463.51(76) 
463.82(51) [22] 



1.16809(89) 

1.16852(76) 
1.16768(78) 
1.16812(64) 
1.16782(30) [12] 



TABLE I: Comparison of numerical results for the quantities defined in Eqs. (28,29,30,31), namely the internal energy, the 
magnetic susceptibility, the correlation length and the dimensionless ratio U4, as obtained in the two dimensional Ising model at 

its critical point, for different lattice sizes. For the susceptibility wo show also the cluster estimate, Eq. (32), that improves less 
than a 20% in terms of CPU time over the standard estimator Eq. (29). In contrast, the covariaiicc improved estimates obtained 
from the mean- value identities Ecjs. (33,36) do save more than a factor 2 in computer cost. To check for the possibility of a bias 
induced by the covariance error-reduction, we compare also with exact results (for the internal energy) or with independent 
and longer Monte Carlo simulations. 



regime). The jackknife error was used throughout for er- 
ror computations. The used programs were minor modi- 
fications of the sample programs in [18]. 

As in section II C, we name Bi {i = SW,CI,SD) the 

block average of consecutive Monte Carlo measurements 
of Bi. The results of the analysis using i?sw and/or Bci 
as control variates are shown in Table I. We detect no 
bias when comparing with exact results or with previ- 
ously published (and more precise) computations. When 
using the two control variates together, a CPU factor 
gain larger than two is achieved for x, (e) and ^, for all 
values of L. This gain is largest for L = 16 and deterio- 
rates somewhat in going to L = 128, but then stabilizes 
and does not significantly deteriorate further when going 
to L = 512. For instance, for L = 512 the CPU gain in 
the computation of the susceptibility is a factor 2.3 when 
comparing with the standard spin estimate [Eq. (29)] or 
a factor 2.0 when comparing with the cluster estimate 
[Eq. (32)]. 

Rather smaller gains are obtained by using Bgw 
and/or Bci individually: for instance, in the x compu- 
tation using Bsw alone, the CPU gain factor is 2.8 for 
L = 16, but it deteriorates to 1.56 for L = 128 and 1.42 



for L = 512. The fact that wc do significantly better 
by combining the two control variates (rather than using 
only one of them) suggests that the orthogonal compo- 
nent of Bci with respect to Bsw is sizeable (and that 
this component still strongly correlates with the squared 
magnetization). 

There are some interesting issues regarding the useful- 
ness of Bsw as a control variate for x- This is an instance 
of the problem considered in Sect. II D: we are after the 
optimal linear combination between Eqs. (29,32). In Ap- 
pendix A we show that the optimal choice is very close 
to the cluster-based susceptibility, Eq. (32) (the opti- 
mum is exactly (32) if successive measurements arc sepa- 
rated by a time interval of many autocorrelation times, so 
that they are essentially statistically independent). This 
statement can be reworded as the use of the spin-based 
susceptibility via the control variate Bsw barely improves 
the cluster-based susceptibility (the usefulnes of -Bsw de- 
creases with growing autocorrelation times, Eq. (A17)). 

A related, yet different, issue is the temperature evolu- 
tion of the efRcency of the cluster estimator for x- At Kc, 

Table I, errors for the spin and cluster based estimates 
are similar. This is in marked contrast with the situation 
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in the paramagnetic scaling region [k < k^, 1 ^ ^ ^ L), 
see e.g. [22]. In Eq. (A16), we give the (squared) ratio 
of statistical errors for the two estimators in terms of an 
autocorrelation time and of several expectation values of 
the static cluster-sizes distribution. At Kc, a giant clus- 
ter dominates sums such as that in (32), see Table II 
in the Appendix. As a consequence, the squared error 
ratio at Kc, Eq. (A16), is ~ 1 -I- 7^^, never very large 

since Tint,c ^ 1/2, and decreasing with growing L due 
to critical slowing down. On the other hand, in the scal- 
ing region the largest cluster is not dramatically large, 
and a major (static) variance reduction is achieved by 
averaging over the sign of the different clusters at a fixed 
time. This gain is at the level of a single measurement. 
Yet, Eq. (A16), the benefits remains after that the Monte 
Carlo time averaging. 

As for the benefits of including Ssd in the covariance 
reduction procedure, they are marginal at the critical 
point (the CPU gained when adding Bsd to {i?sw, Bqi} 
is less than a 10%). Nevertheless, in the scaling region 
it can pay to consider Bsb- For instance, in a L = 512 
lattice at K = 0.42, where ^ ~ 12, we obtain a CPU gain 
factor of 1.23 for the cluster estimator of the susceptibil- 
ity, and 1.6 factor for the energy. 
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APPENDIX A: ON CLUSTER ESTIMATORS 

We will answer here two related questions: (1) Why 
the control variate i?sw improves so little the cluster es- 
timate of the susceptibility [(32)] and (2) why, at the 
critical point, the susceptibility cluster estimator barely 
improves over the spin one [(29)]. Both questions are spe- 
cific to the Swendsen-Wang dynamics for ferromagnetic 
systems [24]. 

Under a simplifying assumption. Question 1 is ad- 
dressed in the Sect. Al, while Question 2 is considered 
in Sect. A 2. The assumption is that successive measure- 
ments are separated by a time interval of many autocor- 
relation times, so that they are essentially statistically 
independent. The assumption is removed in Sect. A3 
(largely inspired in Ref. [26]). Yet, the static variance 
ratio computed in Sect. A 2 still plays a prominant role 
in the general case. 



1. Bsw for independent measurements 



V. CONCLUSIONS 

For any problem amenable to a path-integral for- 
mulation there are well known strategies (Schwinger- 
Dyson [8], Callen [9], etc.) to obtain identities, that 
imply the vanishing of a precise linear combination of 
expectation values of non-trivial observables. More often 
than not, researchers performing Monte Carlo simula- 
tions compute the quantities appearing in the identities, 
since the extra CPU costs is negligible and the identities 
provide important consistency tests. In particular, they 
allow to detect easily problems as frightening as program- 
ming bugs, failure of the used pseudo random number 
generator, or thermalization bias. What we have pointed 
out here is that, using the general and simple control 
variates strategy [6, 7], these identities provide as well a 
significant error reduction in the final outcome of Monte 
Carlo simulations. This comes at negligible CPU cost. 
The method has been cixemplified in the standard bench- 
mark, the two-dimensional Ising model at criticality. 

We note nevertheless that less trivial applications of 
this technique already exist. In particular, we have found 
that a Schwinger-Dyson equation providing a now stan- 
dard thermalization test in spin-glass simulations [14], 
can gain an error reduction factor of one half on some 
final quantities (e.g. the correlation length) [23]. 



For independent measurements, time correlation func- 
tions, Eq. (11), vanish for all times t ^ 0. Hence, we 
need only to compute a static covariance. 

Let M = i^m(O) be the extensive magnetization (re- 
call Sect. III). At time t in the Swendsen-Wang dynam- 
ics, the lattice will be decomposed in Aft connected com- 
ponents, of size n'. with c = 1,2, . . .A/t (the ordering is 
such that n\ > > rig ... ). All the spins belonging 
to cluster c are given a common sign, 5*. The value 
<S* = ±1 is chosen with 50% probability, independently 
for each cluster c [24]. 

The spin-estimator for is 



nln\,SlSi, . 



(Al) 



On the other hand, if one averages Eq. (Al) over the 2^* 
equivalent choices for the iS* = ±1, only the diagonal 
terms c = c' survive. Hence, the natural cluster estimator 
for {A4^) is the Monte Carlo average of 



c=l 

It is illuminating to write Eq. (Al) as 



(A2) 
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Mi 



4 = Ct- {M") 



rfs = Y. «'SlSl, , Bsw,t = . (A5) 



(A3) 
(A4) 
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Of course, (77^ 



=0, but the statistical indepen- 



dence of the 5* also implies {"q^rig) = 0. Therefore, 

oec = {ill) ' (^M^M'' = iVc) + {Vs) ■ (A6) 

Let us try to improve C using ^sw as control vari- 
ate. We find Ccbs„(0) {vcVs)/^'^ = 0- It follows that 
the improved estimator C* obtained using Bsw con- 
trol variate is just C. Using the language of section II D: 
with no time correlations, the optimal linear combination 
between L'^[m{0)]^ and i"^Ec'^c is just ^"^Ec^c- 



2. The static variance 

Under the independent measurements assumption, the 

(squared) error ratio for the spin [(29)] and cluster [(32)] 
susceptibility estimators equals the static variance ratio 



C 



CcciO) 



(A7) 



To relate with the cluster size distribution, we start 
from Eq. (A3) and a trivial relation between Cj^2^2{0) 
and the dimensionless ratio U4, Eq. (31): 



2\2 



(M2)2 



Ui-1. 



(AS) 



Hence, in the scaling region, where C/4 « 3, the spin esti- 
mator will be remarkably noisier than at Table I. 

The covariance matrix for the ris,r]c can be expressed 
in terms of the nc[28]: 



E-^ -E 



n. 



so that 



R 




Introducing the dimensionless ratios 



9c 



9s 



we note that 



Ui-l = gc + gs , R 



1 + ^. 

gc 



(A9) 
(AlO) 

(All) 

(A12) 
(A13) 



Now, in the paramagnetic scaling region (1 <C ^ <C -L) 
the thermodynamic limit of gs is 2. Indeed, the two 

terms in the difference {rjg) = 2 /(^^ng)^ — ^^n^ 



scale differently: when ^ <C i the first grows as the sys- 
tem volume squared, while the second scales linearly with 



volume[29]. As a consequence, R diverges if one takes 
the largc-L limit at fixed n < Kc- Since the susceptibil- 
ity X — (A^^/L^) remains finite for large L, the error 
incurred when estimating the susceptibility from a single 
measurement, Cj, vanishes in the large- L limit. 

Quite on the contrary, Eq. (A13), considered precisely 
at Kc, strongly suggests that both gc and gs have a finite, 
non vanishing, large-L limit (and hence a finite R). 

We display in Table II our results for gc , and U4 both at 
the critical point and at k = 0.42, where ^ « 12. Indeed 
FI{kc) ~ 1-15 remains bound. As we show in Table II, the 
average ratios n2/ni, n^/ni at Kc arc surprisingly small 
and size independent. In other words, the two sums in 
(AlO) are dominated by m, causing a massive cancela- 
tion that diminish gs as compared to gc- 



3. Monte Carlo time-correlations 

We now drop the assumption of independent measure- 
ments. The (squared) ratio of the errors of the spin and 
susceptibility estimators is no longer R"^, (A7), but 



i?2 



M'-M 



e:=-oo ccc{t) 



(A14) 



Similarly, Eq. (8), the efficiency of Bsw as control variate 
to improve the cluster susceptibility estimator is ruled by 
the correlation coeficient 



rcBsw = 



E*=-(x) ^Cc{t) E*=-(x) C'BswBsw(*) 



(A15) 



Arguing as in Ref. [26] will lead us to our main result: 

1 



rcBsw 



I 



2rint,c 
1 



[R^ + l] 



[2rint,c {B? 



(A16) 
(A17) 



Since R^{kc) ~ 1.3, the efficiency of the cluster esti- 
mator at Kc is ruled by rint.c- Indeed, the (mild) critical 
slowing down can be traced in Table I. The usefulness of 
-Bsw as control variate, Eq. (A17), deteriorates as well 
with growing Tmtfi- 

On the other hand, in the paramagnetic scaling re- 
gion (k < Kc, 1 ^ C ^ -^) one easily has R? ~ 100 
or larger. Given Eq. (AI6), and since Tjnt.c > 1/2 (be- 
cause Ccc{t) > 0, see below), this is due to the large R^ 
that are to be expected, recall Sect. A 2 (we expect rint,c 
to be upper-bounded in the largc-L limit, for k < Kc)- 
However, Eq. (A17), in the scaling region, Bsw behaves 
poorly as a control variate, since Tint.c is lower-bounded 
while i?2 diverges in the large-L limit. 

To derive Eqs. (A16,A17) we first note that (in space 
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K = Kc,y = {D + 2- ri)/2 


L 


gc 


R 




(n2/ni) 


(ns/ni) 


16 


0.11590(43) 


1.19371(35) 


1.00701(60) 


0.12953(41) 


0.06484(21) 


128 


0.12440(62) 


1.16125(51) 


1.00687(84) 


0.12528(49) 


0.06180(27) 


512 


0.12572(62) 


1.15683(41) 


1.00683(93) 


0.12485(52) 


0.06166(29) 


K = 0.42, y = 


16 


0.28942(73) 


1.22644(53) 


137.68(12) 


0.27402(53) 


0.15343(34) 


128 


0.16134(80) 


3.4069(80) 


1016.78(66) 


0.72070(22) 


0.58116(24) 


512 


0.010212(22) 


13.967(15) 


1963.27(60) 


0.82843(14) 


0.73909(15) 



TABLE II: Numerical determinations for different lattice sizes, both at Kc and at k = 0.42 (where ^ ~ 12 for large L), of the 
dimensionless ratios gc and gs, Eq. (A12), and the cluster-estimator's merit number R, Eq. (All), recall also (A7). Note that 
at K = 0.42 the advantages of using a cluster estimator grows fastly with L, while it remains fairly modest at ftc- We show 
as well the product (mL"'^"'"^^'''/^) at Kc, where ni is the largest cluster, D = 2 and q = 1/4 is the anomalous dimension. 
We see that n\ scales as the full sum X^j.nc (indeed (ni)^ < {n\) < (^^ni) = L^X ^ L^'^^~^). On the contrary, at 
K = 0.42, n\ grows only mildly with L. We can also compare for both k the average ratio of the sizes of the second-largest to 
largest cluster (712/^1), and that of third-largest to largest (0,3 /ni). While at Kc there is a L- invariant hierarchical structure 
ni ~ 8n2 ~ 16n3 . . . , at « = 0.42 the largest cluster becomes a typical one with growing L. 



dimension D) 



CcM-{t)~Ccc{t), (A18) 
-2CcM<t) (A19) 



Eq. (18) suggests that it will be fruitful to recall the 
main properties of the operator = Pbond^spin- The 
two operators i-bond and Pgpin are of heat-bath type, and 
their action is quite simple [24]: for any observable O, 
PspinO = E{0\{b}) and PbondO = E{0\{S}). In partic- 
ular. we have 



C , P^pinC = C , PbondM' = . (A20) 



All heat-bath operators, P , share some nice fea- 
tures: they are self-adjoint, (Oi, P^Oz) = (P™Oi, O2) 
and idempotent [pHBj2_pHB Furthermore, they pre- 
serve expectation values (O) = {P™0) [30]. 

Combining P^"^ = PbondPspin with [Pspin]^ = Pspin 

(hence [Pbo„dPspi„]*>° = [P'^^]*Pspin) and with the self- 
adjointedness of Pgpin and Pbond, we get for t > 



P ■ 

-'spin-'''' 



(^2^ jpSWjt^2) ^ (Al2jpSW^*J 

= (^l^[psw]*c), 

= (PspinPbondA^^ [pSW]t-lC) , 

= (C, [pS^^y-^C) , (A21) 



(7^2, [PS^]*C) = (C, [pS^]*-iC) , (A22) 
(C, [PSW]*7W2-) ^ (C,[pSW]tc). (A23) 



Now, Eqs. (A21,A22,A23) tell us that {6tfi stands for 
Kronecker's delta, and we assume t > 0) 



CM^M^{t) — <5t,oCAi2_A42 (0) (A24) 
+ (l-'5t,o)Ccc(t-l), 
CcM<t) = KoCcdO) (A25) 

+ (l-^t,o) ^ • 



Deriving at this point Eqs. (A16,A17) is straightforward. 
We note, finally, that 

(C, [PSW]*C) = (C, [PspinPbo„dPspin]*C) , (A26) 

which implies that Ccc{i) > 0, and hence Tint,c > 1/2. 
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